Applying Clickstream Data Mining to Real-Time Web Crawler Detection and Containment Using ClickTips Platform

Authors

  • Anália Lourenço
  • Orlando Belo
Abstract

The uncontrolled spread of Web crawlers has led to undesired situations of server overload and content misuse. Most programs still have legitimate and useful goals, but standard detection heuristics have not evolved along with Web crawling technology and are now unable to identify most of today’s programs. In this paper, we propose an integrated approach to the problem that ensures the generation of up-to-date decision models, targeting both monitoring and clickstream differentiation. The ClickTips platform sustains Web crawler detection and containment mechanisms, and its data webhousing system is responsible for clickstream processing and further data mining. Web crawler detection and monitoring help preserve Web server performance and Web site privacy, while differentiated clickstream analysis provides focused reporting and interpretation of navigational patterns. The generation of up-to-date detection models is based on clickstream data mining and targets not only well-known Web crawlers, but also camouflaging and previously unknown programs. Experiments with different real-world Web sites are encouraging, indicating that the approach is not only feasible but also adequate.
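As a rough illustration of the kind of session-level clickstream processing the abstract alludes to, the Python sketch below groups log hits into sessions, derives a few features, and applies a hand-written rule. It is not the ClickTips implementation: the `Hit` record, the 30-minute timeout, the feature names, and every threshold are assumptions chosen for illustration, and the rule merely stands in for a decision model that would be mined from labeled sessions.

```python
from collections import defaultdict
from dataclasses import dataclass

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed value)


@dataclass
class Hit:
    """One Web server log record (hypothetical, simplified layout)."""
    ip: str
    agent: str
    timestamp: float  # seconds since epoch
    method: str
    path: str
    referrer: str


def sessionize(hits):
    """Group hits by (ip, agent) and split on gaps longer than SESSION_TIMEOUT."""
    by_client = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h.timestamp):
        by_client[(hit.ip, hit.agent)].append(hit)

    sessions = []
    for client_hits in by_client.values():
        current = [client_hits[0]]
        for hit in client_hits[1:]:
            if hit.timestamp - current[-1].timestamp > SESSION_TIMEOUT:
                sessions.append(current)
                current = [hit]
            else:
                current.append(hit)
        sessions.append(current)
    return sessions


def session_features(session):
    """Derive a few illustrative per-session features."""
    n = len(session)
    return {
        "requests": n,
        "robots_txt": any(h.path == "/robots.txt" for h in session),
        "head_ratio": sum(h.method == "HEAD" for h in session) / n,
        "no_referrer_ratio": sum(h.referrer in ("", "-") for h in session) / n,
    }


def looks_like_crawler(features):
    """Hand-written stand-in for a mined decision model (thresholds are assumed)."""
    if features["robots_txt"]:
        return True
    return features["requests"] >= 5 and (
        features["head_ratio"] > 0.5 or features["no_referrer_ratio"] > 0.9
    )
```

In a setting like the one the abstract describes, the features would presumably feed a classifier retrained on fresh data rather than fixed thresholds, so that camouflaging or previously unknown crawlers can be picked up as their behavior changes.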

Similar Resources

A Heuristic-Regression Approach to Crawler Pattern Identification on Clickstream Data

Web robots, crawlers and spiders are software agents that visit Web sites periodically for multiple purposes. Usually, these activities generate additional clickstream and pattern data, which raises the need for extra processing and filtering. Robots are not conventional Web users. However, some of them intentionally pretend to be so. Their requests flood Web server logs, p...

A Data Warehouse/OLAP Framework for Web Usage Mining and Business Intelligence Reporting

Web usage mining is the application of data mining techniques to discover usage patterns and behaviors from web data (clickstream, purchase information, customer information etc) in order to understand and serve e-commerce customers better and improve the online business. In this paper we present a general Data Warehouse/OLAP framework for web usage mining and business intelligence reporting. W...

A data warehouse/online analytic processing framework for web usage mining and business intelligence reporting

Web usage mining is the application of data mining techniques to discover usage patterns and behaviors from web data (clickstream, purchase information, customer information, etc.) in order to understand and serve e-commerce customers better and improve the online business. In this article, we present a general data warehouse/online analytic processing (OLAP) framework for web usage mining and ...

INSITE: A Tool for Real-Time Knowledge Discovery from Users Web Navigation

The major challenges in web mining are a) tracking the data accurately (as not everything is reported to the web server), b) real-time acquisition of the huge volume of data (435 Million visits to yahoo per day, 2-4 GB clickstream data per hour), c) real-time interpretation of the data without compromising the privacy of the user (order of seconds for personalization and targeting information),...

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Considering vast amounts of data in large datasets, it becomes more difficult to detect fraud transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

Publication date: 2006